Studying SVM Method's Scalability Using Text Documents

Authors

  • Daniel Morariu
  • Maria N. Vintan
  • Lucian Vintan
Abstract

In recent years the quantity of text documents has been increasing continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential for obtaining good results. The quality of learning depends on the size of the training data. When working with huge training data sets, the training time increases exponentially, which becomes a problem. In this paper we present a method that allows working with huge data sets in the training step without exponentially increasing the training time and without significantly decreasing the classification accuracy.

Similar resources

Electronic Document Classification Using Support Vector Machine - An Application for E-Learning

Application of machine learning techniques to embed adaptivity in E-Learning frameworks is receiving considerable attention. Text classification, or the task of automatically assigning semantic categories to natural language text, has therefore become one of the key methods for organizing digital content. Reports on SVM have mainly focussed on the theory or conceptual application of the...

Comparative Assessment of the Performance of Three WEKA Text Classifiers Applied to Arabic Text

This research is conducted in order to compare the performance of three well-known text classification techniques, namely the Support Vector Machine (SVM) classifier, the Naïve Bayes (NB) classifier, and the C4.5 classifier. Text classification aims to automatically assign a text to a predefined category based on linguistic features and content. These three techniques are compared using a set of Arabic text ...

FISA: Feature-Based Instance Selection for Imbalanced Text Classification

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative train...

Aspects concerning on the SVM Method’s Scalability

In recent years the quantity of text documents has been increasing continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential for obtaining a good classifier. The quality of learning depends on the size of the training data. When working with huge learning data sets, problems regarding the training time that in...

Classification of Text Documents Based on Minimum System Entropy

In this paper, we describe a new approach to classification of text documents based on the minimization of system entropy, i.e., the overall uncertainty associated with the joint distribution of words and labels in the collection. The classification algorithm assigns a class label to a new document in such a way that its insertion into the system results in the maximum decrease (or least increa...

Journal title:
  • Scalable Computing: Practice and Experience

Volume 9

Publication date: 2008